An increasing number of public datasets have shown a marked clinical impact on assessing anatomical structures. However, each of these datasets is small, partially labeled, and rarely investigates severe tumor subjects. Moreover, current models are limited to segmenting specific organs/tumors and cannot be extended to novel domains and classes. To tackle these limitations, we introduce embeddings learned from Contrastive Language-Image Pre-training (CLIP) into segmentation models, dubbed the CLIP-Driven Universal Model. The Universal Model can better segment 25 organs and 6 types of tumors by exploiting the semantic relationships between abdominal structures. The model is developed from an assembly of 14 datasets with 3,410 CT scans and evaluated on 6,162 external CT scans from 3 datasets. We rank first on the public leaderboard of the Medical Segmentation Decathlon (MSD) and achieve state-of-the-art results on Beyond The Cranial Vault (BTCV). Compared with dataset-specific models, the Universal Model is computationally more efficient (6x faster), generalizes better to CT scans from varying sites, and shows stronger transfer learning performance on novel tasks. The design of the CLIP embedding enables the Universal Model to be easily extended to new classes without catastrophically forgetting the previously learned classes.
Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible, and robust, and the underlying software framework must be aware of the particularities (e.g., geometry, physiology, physics) of the medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and provides purpose-specific AI model architectures, transformations, and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by and receiving contributions from research, clinical, and industrial teams from around the world, who are pursuing applications spanning nearly every aspect of healthcare.
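This paper presents OmniVL, a new foundation model designed to support both image-language and video-language tasks using one universal architecture. It adopts a unified transformer-based visual encoder for both image and video inputs, and can therefore perform joint image-language and video-language pretraining. We demonstrate, for the first time, that such a paradigm benefits both image and video tasks, as opposed to the conventional one-directional transfer (e.g., using image-language to help video-language). To this end, we propose a decoupled joint pretraining of image-language and video-language that effectively decomposes vision-language modeling into spatial and temporal dimensions and obtains performance boosts on both image and video tasks. Moreover, we introduce a novel unified vision-language contrastive (UniVLC) loss that leverages image-text, video-text, image-label (e.g., image classification), and video-label (e.g., video action recognition) data together, so that both supervised and noisily supervised pretraining data are utilized as much as possible. Without incurring extra task-specific adapters, OmniVL can simultaneously support visual-only tasks (e.g., image classification, video action recognition), cross-modal alignment tasks (e.g., image/video-text retrieval), and multi-modal understanding and generation tasks (e.g., image/video question answering, captioning). We evaluate OmniVL on a wide range of downstream tasks and achieve state-of-the-art or competitive results with similar model size and data scale.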
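A unified contrastive loss of this kind can be sketched as a bidirectional InfoNCE in which every pair sharing a label counts as a positive, so that plain image-text or video-text pairs (each given its own unique label) reduce to the usual CLIP-style objective. This is a minimal pure-Python sketch under that assumption, in the spirit of label-aware contrastive learning, and not OmniVL's actual implementation; `unified_contrastive_loss`, the precomputed similarity matrix `sim` (assumed already divided by a temperature), and the `labels` encoding are all hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(row: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]


def _one_direction(sim: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    # For each anchor i, average the negative log-probability over all
    # candidates j that share anchor i's label (multi-positive InfoNCE).
    n = len(labels)
    total = 0.0
    for i in range(n):
        positives = [j for j in range(n) if labels[j] == labels[i]]
        lp = log_softmax(sim[i])
        total += -sum(lp[j] for j in positives) / len(positives)
    return total / n


def unified_contrastive_loss(sim: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """sim[i][j]: temperature-scaled similarity between visual i and text j.
    labels[i]: label id of pair i; unlabeled pairs get unique ids (hypothetical encoding)."""
    n = len(labels)
    # Transpose to obtain the text-to-visual direction.
    sim_t = [[sim[j][i] for j in range(n)] for i in range(n)]
    return 0.5 * (_one_direction(sim, labels) + _one_direction(sim_t, labels))
```

For example, `unified_contrastive_loss([[5.0, 0.0], [0.0, 5.0]], [0, 1])` scores two correctly aligned pairs with distinct labels; giving both pairs the same label would instead treat every visual-text combination in the batch as a positive.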
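Developing semi-supervised task-oriented dialog (TOD) systems by leveraging unlabeled dialog data has attracted increasing interest. For semi-supervised learning of latent-state TOD models, variational learning is often used, but it suffers from the annoyingly high variance of the gradients propagated through discrete latent variables and from the drawback of only indirectly optimizing the target log-likelihood. Recently, an alternative algorithm, called joint stochastic approximation (JSA), has emerged for learning discrete latent variable models with impressive performance. In this paper, we propose to apply JSA to semi-supervised learning of latent-state TOD models, which we refer to as JSA-TOD. To our knowledge, JSA-TOD represents the first work in developing JSA-based semi-supervised learning of discrete latent variable conditional models for long sequential generation problems such as those in TOD systems. Extensive experiments show that JSA-TOD significantly outperforms its variational learning counterpart. Remarkably, semi-supervised JSA-TOD using 20% of the labels performs close to the fully supervised baseline on MultiWOZ2.1.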
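We consider the problem of autonomous channel access (AutoCA), where a group of terminals tries to discover, in a distributed fashion, a communication strategy with an access point (AP) over a common wireless channel. Due to irregular topology and the limited communication range of terminals, a practical challenge for AutoCA is the hidden terminal problem, which is notorious in wireless networks for deteriorating throughput and delay performance. To meet this challenge, this paper presents a new multi-agent deep reinforcement learning paradigm, dubbed MADRL-HT, tailored for AutoCA in the presence of hidden terminals. MADRL-HT exploits topological insights and transforms the observation space of each terminal into a scalable form that is independent of the number of terminals. To compensate for partial observability, we put forth a look-back mechanism so that terminals can infer the behaviors of their hidden terminals from the carrier-sensed channel states as well as from the AP's feedback. A window-based global reward function is proposed, whereby terminals are instructed to maximize the system throughput while balancing the terminals' transmission opportunities over the course of learning. Extensive numerical experiments verify the superior performance of our solution benchmarked against the legacy carrier-sense multiple access with collision avoidance (CSMA/CA) protocol.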
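Relation extraction and named entity recognition have always been considered two distinct tasks that require different input data, labels, and models. However, both are essential for structured sentiment analysis. We argue that the two tasks can be combined into a single stacked model that uses the same input data. We performed different experiments to find the best model for extracting multiple opinion tuples from a single sentence. Each opinion tuple consists of a holder, a target, and an expression. With the opinion tuples, we are able to extract the relationships we need.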
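Named entity recognition (NER) remains challenging when entity mentions can be discontinuous. Existing methods break the recognition process into several sequential steps. In training, they predict conditioned on the golden intermediate results, whereas at inference they rely on the model output of the previous steps, which introduces exposure bias. To solve this problem, we first construct a segment graph for each sentence, in which each node denotes a segment (either a continuous entity on its own, or a part of a discontinuous entity), and an edge links two nodes that belong to the same entity. The nodes and edges can each be generated in one stage with a grid tagging scheme and learned jointly using a novel architecture named Mac. Discontinuous NER can then be reformulated as a non-parametric process of discovering maximal cliques in the graph and concatenating the spans in each clique. Experiments on three benchmarks show that our method outperforms the state-of-the-art (SOTA) results, with an improvement of up to 3.5 percentage points in F1, and achieves a 5x speedup over the SOTA model.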
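The non-parametric decoding step (find maximal cliques in the segment graph, then concatenate the spans of each clique) can be sketched with the classic Bron-Kerbosch algorithm with pivoting. This is a minimal sketch assuming the segments and edges have already been predicted; the names `maximal_cliques` and `decode_entities`, and the representation of segments as inclusive `(start, end)` token offsets, are hypothetical and not taken from the Mac paper.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # inclusive token offsets (start, end), hypothetical format


def maximal_cliques(adj: Dict[int, Set[int]]) -> List[Set[int]]:
    """Enumerate all maximal cliques with Bron-Kerbosch plus pivoting."""
    cliques: List[Set[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is maximal
            return
        # Pivot on the vertex with most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def decode_entities(segments: List[Span], edges: List[Tuple[int, int]]) -> List[List[Span]]:
    """Each maximal clique of segment nodes becomes one (possibly discontinuous) entity."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(segments))}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    entities = [sorted(segments[i] for i in clique) for clique in maximal_cliques(adj)]
    return sorted(entities)


def render(tokens: List[str], entity: List[Span]) -> str:
    # Concatenate the tokens covered by each span of the entity, in order.
    return " ".join(" ".join(tokens[s:e + 1]) for s, e in entity)
```

For the tokens `["aching", "in", "legs", "and", "shoulders"]`, the segments `[(0, 2), (0, 1), (4, 4)]` with a single edge `(1, 2)` decode to the continuous entity "aching in legs" and the discontinuous entity "aching in shoulders". An isolated node forms a singleton clique, which is how a continuous entity falls out of the same procedure.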
Classification using supervised learning requires annotating a large amount of class-balanced data for model training and testing. This has practically limited the scope of applications of supervised learning, in particular deep learning. To address the issues associated with limited and imbalanced data, this paper introduces a sample-efficient co-supervised learning paradigm (SEC-CGAN), in which a conditional generative adversarial network (CGAN) is trained alongside the classifier and supplements the annotated data with semantics-conditioned, confidence-aware synthesized examples during the training process. In this setting, the CGAN not only serves as a co-supervisor but also provides complementary quality examples to aid classifier training in an end-to-end fashion. Experiments demonstrate that the proposed SEC-CGAN outperforms the external classifier GAN (EC-GAN) and a baseline ResNet-18 classifier. For the comparison, all classifiers in the above methods adopt the ResNet-18 architecture as the backbone. In particular, on the Street View House Numbers dataset, using 5% of the training data, SEC-CGAN achieves a test accuracy of 90.26%, as opposed to 88.59% for EC-GAN and 87.17% for the baseline classifier; on the highway image dataset, using 10% of the training data, SEC-CGAN achieves a test accuracy of 98.27%, compared to 97.84% for EC-GAN and 95.52% for the baseline classifier.
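One plausible reading of "confidence-aware" supplementation is a filter that keeps a label-conditioned synthetic example only when the classifier both predicts the conditioning label and does so with high probability. The sketch below is purely an assumption for illustration: the abstract does not specify the selection rule, and `confident_synthetic` and its `threshold` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def confident_synthetic(
    samples: Sequence[object],
    cond_labels: Sequence[int],
    class_probs: Sequence[Sequence[float]],
    threshold: float = 0.9,  # hypothetical confidence cut-off
) -> List[Tuple[object, int]]:
    """Keep (sample, label) pairs whose classifier prediction agrees with the
    label the CGAN was conditioned on, with probability >= threshold."""
    kept: List[Tuple[object, int]] = []
    for x, y, p in zip(samples, cond_labels, class_probs):
        pred = max(range(len(p)), key=p.__getitem__)  # argmax class
        if pred == y and p[y] >= threshold:
            kept.append((x, y))
    return kept
```

The surviving pairs would then be appended to the real annotated mini-batch for the classifier update, so that the generator contributes only examples the classifier already finds semantically consistent with their labels.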
Long-document retrieval aims to fetch query-relevant documents from a large-scale collection, where knowledge distillation has become the de facto approach to improving a retriever by mimicking a heterogeneous yet powerful cross-encoder. However, in contrast to passages or sentences, retrieval on long documents suffers from the scope hypothesis that a long document may cover multiple topics. This maximizes their structural heterogeneity and poses a granular-mismatch issue, leading to inferior distillation efficacy. In this work, we propose a new learning framework, fine-grained distillation (FGD), for long-document retrievers. While preserving the conventional dense retrieval paradigm, it first produces globally consistent representations across different fine granularities and then applies multi-granular aligned distillation only during training. In experiments, we evaluate our framework on two long-document retrieval benchmarks, on which it achieves state-of-the-art performance.
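The core of cross-encoder-to-retriever distillation is commonly a KL divergence between the teacher's and the student's softmax distributions over the same candidate list. The sketch below shows that generic objective and then, as a hypothetical illustration of "multi-granular" distillation, averages it over several granularity levels; this averaging scheme and the names `distill_kl` and `multi_granular_loss` are assumptions, not FGD's published loss.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def distill_kl(teacher: Sequence[float], student: Sequence[float], temperature: float = 1.0) -> float:
    """KL(p_teacher || q_student) over the same list of candidates."""
    p = softmax(teacher, temperature)
    q = softmax(student, temperature)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0.0)


def multi_granular_loss(
    levels: Dict[str, Tuple[Sequence[float], Sequence[float]]],
    temperature: float = 1.0,
) -> float:
    # Hypothetical: each level (e.g. "document", "passage", "sentence") maps to
    # (teacher_scores, student_scores); the per-level KL terms are averaged.
    return sum(distill_kl(t, s, temperature) for t, s in levels.values()) / len(levels)
```

Because softmax is shift-invariant, the student is only pushed to reproduce the teacher's relative ranking of candidates, not its absolute score scale.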
Exploring dense matching between the current frame and past frames for long-range context modeling, memory-based methods have recently demonstrated impressive results in video object segmentation (VOS). Nevertheless, due to their lack of instance understanding, these approaches are often brittle to large appearance variations or viewpoint changes resulting from the movement of objects and cameras. In this paper, we argue that instance understanding matters in VOS, and that integrating it with memory-based matching yields a synergy, which is intuitively sensible given the definition of the VOS task, i.e., identifying and segmenting object instances within the video. Towards this goal, we present a two-branch network for VOS, where the query-based instance segmentation (IS) branch delves into the instance details of the current frame and the VOS branch performs spatial-temporal matching with the memory bank. We employ the well-learned object queries from the IS branch to inject instance-specific information into the query key, with which instance-augmented matching is then performed. In addition, we introduce a multi-path fusion block to effectively combine the memory readout with multi-scale features from the instance segmentation decoder, incorporating high-resolution instance-aware features to produce the final segmentation results. Our method achieves state-of-the-art performance on DAVIS 2016/2017 val (92.6% and 87.1%), DAVIS 2017 test-dev (82.8%), and YouTube-VOS 2018/2019 val (86.3% and 86.3%), outperforming alternative methods by clear margins.
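The memory readout at the heart of memory-based VOS is usually a softmax attention: each query-key position is compared with every memory key, and the normalized affinities weight a sum of memory values. The sketch below shows that generic readout in pure Python, with an optional additive `instance_embedding` as a hypothetical stand-in for injecting instance information into the query key; the paper's actual injection mechanism is not described in the abstract, and `memory_readout` is an illustrative name.

```python
import math
from typing import List, Optional, Sequence

Vec = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def memory_readout(
    query_keys: Sequence[Vec],
    memory_keys: Sequence[Vec],
    memory_values: Sequence[Vec],
    instance_embedding: Optional[Vec] = None,  # hypothetical instance injection
) -> List[Vec]:
    """For each query position, return an affinity-weighted sum of memory values."""
    dim_v = len(memory_values[0])
    out: List[Vec] = []
    for qk in query_keys:
        if instance_embedding is not None:
            # Assumption: instance information is added to the query key.
            qk = [a + b for a, b in zip(qk, instance_embedding)]
        scale = 1.0 / math.sqrt(len(qk))  # scaled dot-product affinity
        logits = [dot(qk, mk) * scale for mk in memory_keys]
        m = max(logits)
        weights = [math.exp(l - m) for l in logits]
        z = sum(weights)
        weights = [w / z for w in weights]  # softmax over memory positions
        out.append([sum(w * mv[d] for w, mv in zip(weights, memory_values)) for d in range(dim_v)])
    return out
```

A query key aligned with one memory key reads out almost exactly that key's value, which is why appearance changes that break key similarity hurt such methods and why extra instance cues in the query key can help.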